ABSTRACT: A theory of early and intermediate visual information processing is
given, which extends to about the level of figure-ground separation. Its core is a
computational theory of texture vision. Evidence obtained from perceptual and
from computational experiments is adduced in its support. A consequence of the
theory is that high-level knowledge about the world influences visual processing
later and in a different way from that currently practiced in machine vision.

Summary

Understanding how the visual cortex analyzes natural images is
one goal of visual neurophysiology. At some stage, we need to confront the
information processing problems that are involved. A series of computational
experiments on natural images was therefore undertaken, and a visual pre-processor
emerged with the following structure:

(1) Approximations to the first and second directional derivatives of intensity are
measured everywhere. They are computed by convolving the image with "edge-shaped"
and "bar-shaped" masks.

(2) These measurements are parsed into an orientation-dependent description of
the intensity changes present in the image. The parsing process consists of
discovering and matching peaks and troughs in the measurements, and roughly
classifying local patterns of peaks into EDGEs, LINEs, SHADING, etc.

(3) The descriptions obtained at each orientation are combined, termination points
of edges are discovered, and small blobs are isolated and described.

This pre-processor computes what is called the primal sketch of
an image, but for most images it is large and unwieldy. By examining our ability to
interpret certain simple drawings, it is demonstrated that a variety of abstract
grouping processes and related facilities are present in our visual systems. It is
shown how, if applied to the primal sketch, these processes are capable of
successfully analyzing many kinds of visual texture, and of extracting perceived
"figure" from ground. It is conjectured that these operations can account for the
entire range of texture discriminations of which we are capable, and the analysis
of several real images is given in its support. The conjecture relegates the
influence of higher-level knowledge on visual processing to a much later stage than
is currently found in machine vision programs, and it implies that such knowledge
should influence the control of, rather than the actual computations in, the earlier
stages of analysis.



Preface 

The work of Barlow (1953), of Mountcastle (1957), of Lettvin et
al. (1959), and of Hubel and Wiesel (1962) initiated what is widely regarded as a
breakthrough in visual neurophysiology. But despite the subsequent accumulation
of a wealth of anatomical and physiological information about the mammalian visual
cortex, our knowledge of its information processing function, or even of how
difficult are the problems that it solves, remains rudimentary.

This is no accident. Physiology has always been concerned with
how organisms work. Its goals are to unravel the local mechanisms within an
organism and to understand their place in the functioning of the animal as a whole.
While the concerns of physiology lay with mechanical, or even with chemical or
physical phenomena, the physicist's background knowledge and everyday
experience sufficed to provide him with the necessary insight into function. As
physiology has turned to information processing problems, however,
neurophysiologists have lost the reliable background intuition that has been
fundamental to the success of the discipline in the past. The situation in modern
neurophysiology is that people are trying to understand how a particular
mechanism performs a computation that they cannot even formulate, let alone
provide a crisp summary of ways of doing. To rectify the situation, we need to
invest considerable effort in studying the computational background to questions
that can be approached in neurophysiological experiments.

Therefore, although the work described here arises from a deep
commitment to the goals of neurophysiology, the work is not about neurophysiology
directly, nor is it about simulating neurophysiological mechanisms; it is about
studying vision. It amounts to a series of computational experiments, inspired in
part by some findings in visual neurophysiology. The need for them arises
because, until one tries to process an image or to make an artificial arm thread a
needle, one has little idea of the problems that really arise in trying to do these
things. Computational experiments allow one to study in detail what combination of
factors causes a method, or group of methods, to succeed or fail in a number of
particular circumstances that originate from real-world data. The power of this
approach is that the knowledge one obtains concerns facts that are inherent in the
task, not in the structural details of the mechanism performing it. Such knowledge
is a vital prerequisite for understanding mammalian visual systems fully, and it is
knowledge that cannot be obtained in any other way.



Introduction 

The vision problem begins with a large gray-level intensity array,
and culminates in a description that depends on that array, and on the purpose for
which it is being viewed. The question of interest is what has to go on in
between. In this article, we shall restrict our attention to single frame,
monochromatic, monocular images without specularities, reflections, translucency,
transparency or light sources; and we shall study some of the problems that arise
in understanding early and intermediate levels of visual information processing.

Perhaps the best way of introducing the topic is to pose some
questions:

(1) What is early visual processing for?

(2) How much of visual information processing can proceed using purely
data-driven techniques?

(3) At what level and by what mechanisms may texture vision and figure-ground
phenomena be implemented?

(4) When does higher level knowledge about the world have to begin interacting
with purely data-driven processes?

(5) When and how does purpose have to influence what computations are made
on an image?

Recent work in computer vision has tried to involve high-level
knowledge about the world at a very early stage in the processing (Shirai 1974,
Freuder 1975). The main motivations for this have been that it has proved very
difficult to extract object boundaries from intensity arrays, and that strategic
deployment of high-level knowledge about a scene can sometimes greatly reduce
the computational effort required for primary image processing. This article
opposes this trend, and makes three main arguments. The first argument consists
of a demonstration that a very great deal of information may in fact be extracted
from an image using knowledge-free techniques. The price one pays for this is
prodigious computing power, and it involves programs that are considerably more
complex than feature-point detecting routines. There can, however, be little doubt
that our own visual systems do in fact possess enormous power (Thomas and
Binford 1974, p 16). The second argument is that deciding what a low-level visual
processor can and cannot deliver is a pre-requisite for useful research into
"higher-level" problems of recognition. For example, the problem of recognizing
and interpreting a scene has a very different flavor in vision systems with rich and
with poor pre-processing abilities. The difference is almost as extreme as trying
to make sense out of an English sentence with and without the benefit of a
knowledge of English syntax. Hence, unless one has a firm idea about what
pre-processing is possible, one is in danger of expending effort on problems that, in a
real sense, are not problems at all. The third argument is that our own perceptual
apparatus probably contains a rich pre-processing ability. Hence if machine vision
intends to say anything useful about those computations, it had better examine the
lower problems first, and study the later ones when the peripheral processing has
been solved. Otherwise one is conducting research without the benefit of data on
which to test one's conclusions. This amounts to a reckless abandoning of
precisely the new experimental tools that computer technology has made available,
namely the ability to decide whether a computational theory successfully
addresses the problems that arise in real-world data.

This article presents a theory of visual processing for its chosen
class of images up to about the level of the figure-ground problem. Its main focus
is a new computational theory of texture vision. The article gives a sufficient
number of examples of processed images to establish that the theory is not
obviously inadequate. The detailed and lengthy arguments that make a positive
case for adequacy will appear elsewhere (Marr 197G). The argument is quite
protracted, and relies on several main steps. Its overall thrust is that the first
step of consequence in visual information processing is to compute a primal
description of the image, and that all subsequent computations are implemented as
manipulations of that description. In order that the reader may follow with ease
the stages in the argument, I summarize the main steps here:

(1) The function of early visual processing is to compute a description of the
gray-level changes present in an image in terms of a vocabulary of gray-level
change primitives. These primitives consist of straight contour segments of
various kinds (SHADING-EDGE, EXTENDED-EDGE, etc.), LINEs, BLOBs, and of various
parameters bound to them such as FUZZINESS, CONTRAST or LIGHTNESS, POSITION,
ORIENTATION, simple measures of their SIZE, and a specification of their
TERMINATION points. This primitive description is obtained from the intensity
array by knowledge-free techniques, and it is called the PRIMAL SKETCH. It
differs from an array of feature points in a subtle way, which is explained in the
text.

(2) From our ability to interpret drawings, one may infer the presence in our
perceptual equipment of symbolic processes that are capable of grouping lines,
points, and blobs together in various ways. Non-symbolic techniques, like
examining the power spectrum of the spatial Fourier transform of the drawings,
cannot account for these grouping phenomena, since the groupings are performed
by mechanisms of construction rather than mechanisms of detection.

(3) For most images, the primal sketch is large and unwieldy. It can however be
capably analyzed by a mechanism that has available the symbolic processes
discovered in step (2), together with the ability to select items out of the primal
sketch on the basis of first-order discriminations acting on the principal
parameters. Hence, it is argued, texture vision rests on grouping operations and
first-order discriminations operating on the primal sketch, rather than on second
order operations operating on the intensity array as suggested by Julesz (1975).
It is further argued that the set of processes whose existence is necessary in
order to explain our ability to interpret drawings is also sufficient, when applied
to the primal sketch, to explain the range of texture vision that is present in
humans. Fourier and power-spectrum techniques on their own are certainly
deficient, and probably also unnecessary.

(4) The extraction of a form from the primal sketch using these techniques
amounts to the figure-ground computation. Except in difficult cases, this extraction
can proceed successfully without calling upon higher level knowledge, and it
precedes the computation of the shape of the extracted form. This has two
important consequences. Firstly, the isolation and delivery of a form to
subsequent processes does not depend on being able to assign an accurate
high-level description to it; and secondly, because of this it is easy to compute rough
descriptions of complex forms. This is probably essential for the fluency of
subsequent analysis of shape.

(5) The extent to which higher level knowledge and purpose influences the
processing up to this stage is very limited. There is at present no reason to
believe that higher level knowledge is needed to compute the primal sketch at all;
and its role in the extraction of form from the primal sketch can often be limited to
deciding which form should be extracted. It is conjectured that in all cases,
higher-level knowledge need be only weakly coupled to the processes that separate
figure and ground. This relegates the use of higher level knowledge to a much
later stage than is found in current machine vision programs, and simultaneously
confines much of its impact to influencing control, rather than interfering with the
actual data-processing that is taking place lower down.

Each step in the argument is treated in a separate section.

Early Processing: computing the primal sketch

The primal sketch consists of a primitive but rich description of
the intensity changes that are present in an image. This description consists of a
set of assertions, expressed in terms of a vocabulary of symbols and modifiers
that are powerful enough to capture all of the invariant information in an intensity
array. An example of such an assertion might be:



(SHADING-EDGE (POSITION (34 48) (73 48))
              (CONTRAST 34)
              (FUZZINESS 17)
              (ORIENTATION 0))

The first problem is how such an assertion may be computed: what
measurements should one first make on an image, and how should those
measurements be combined to enable the assertion to be made?

To help us answer these questions, let us see what
neurophysiology tells us. Simple cells in the cat make measurements upon an
image, and the nature of the measurement that they make is fairly well understood.
Their receptive fields are either bar- or edge-shaped (Hubel and Wiesel 1962),
and if other parameters are held constant, they signal the linear convolution of a
bar- or edge-shaped mask with the intensity distribution currently falling upon the
retina, in logarithmic units of contrast (Maffei and Fiorentini 1973, figure &). Not
all of what are now called simple cells behave linearly, but a distinct subclass
does. The important question for understanding the analysis of visual information
is whether these cells represent assertions other than the fact of the
measurement itself; and if they do, what are they? One idea is, for example, that
a cell with a bar-shaped receptive field signals an assertion about the presence of
a bar in the visual field; but a moment's thought reveals that this is impossible,
since such cells respond also to the presence of a single edge. Another puzzle
concerns the existence of both bar-shaped and edge-shaped receptive fields (in
different cells). Since both kinds detect changes in intensity, why are both types
needed? The reason is probably that changes in intensity are not the only
important types of change in an image: changes in intensity gradient often
provide important, and sometimes the only, information that an object boundary is
present (Marr 1974b). An edge that consists of a step change in intensity gradient
rather than in intensity may be produced by a Lambertian white cube aligned at 45
degrees to the viewer and illuminated from the viewing position. Perceptual
evidence of our sensitivity to such edges is easy to find: Mach bands are the
best-known example (see e.g., Ratliff 1965). This immediately suggests that
one should regard simple cells that have an edge-shaped receptive field as
measuring something like the first directional derivative of intensity, and those with
a bar-shaped receptive field as measuring the second directional derivative. Two
questions then arise: firstly, why compute directional measures? And secondly,
what should one do with the measurements when one has them?
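
The reading of edge-shaped and bar-shaped masks as first and second directional derivatives can be tried on a one-dimensional intensity profile. The sketch below is only an illustration under stated assumptions: the function names, the unit panel weights, and the sign convention of the bar mask are chosen here for convenience and are not taken from the programs described in this article.

```python
def edge_mask(width):
    """Edge-shaped mask (assumed form): a negative panel followed by a positive panel."""
    return [-1] * width + [1] * width


def bar_mask(width):
    """Bar-shaped mask (assumed form): a central panel flanked by two panels of opposite sign."""
    return [1] * width + [-2] * width + [1] * width


def apply_mask(profile, mask):
    """Slide the mask along the profile and return the weighted sum at each offset."""
    n = len(mask)
    return [
        sum(m * v for m, v in zip(mask, profile[i:i + n]))
        for i in range(len(profile) - n + 1)
    ]


# A step change in intensity, used as a toy input.
step = [0] * 10 + [10] * 10
edge_record = apply_mask(step, edge_mask(2))   # behaves like a first difference
bar_record = apply_mask(step, bar_mask(2))     # behaves like a second difference
```

On this step the edge-mask record has a single peak at the step, while the bar-mask record has two peaks of equal size and opposite sign on either side of it. Both records are zero wherever the intensity is constant, because the weights of each mask sum to zero.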

The application of a bar-shaped mask to an image does not, as we
have seen, lead directly to an assertion about the presence of a bar in the image.
The underlying point concerns the relation between computing the bar assertion,
and the inverse transform of the original measurement, and it is a point of some
importance. Let us consider the computation of an assertion about the presence of
a corner in the image of figure 1a. A way of computing this assertion that
immediately springs to mind is to take a specially "tuned" corner-shaped mask. One
might conjecture that a "corner" exists in the image at a point P provided that the
mask gives a value there which is greater than some threshold. Figures 1b and c
show the convolution of corner masks with the image; but can the reader
confidently distinguish the corners from these measurements? The reason for the
failure is that the inverse transform to that produced by a corner-shaped
receptive field depends critically on the boundary conditions that obtain. Any
method that computes a corner assertion is saying something about this inverse,
and so must take enough information into account at each point to satisfy the
dependence on boundary conditions. This extra information may be provided by
looking at the results of the corner-mask at neighboring points, or by looking at
the results of some other measurement taken in parallel; the important point is
that the computation is not a trivial one, and has to take these extra factors into
account. It is not impossible to use primary measurements that are not orientation
sensitive, but the extra computation involved is expensive, since one switches
from having to look in just two directions to having to look in all directions. A
persuasive case would have to be made if one were to choose a primary
measurement that was not directionally selective.

1. The image of a chair (1a) has been convolved with two "corner-masks" (1b and
1c). The mask shapes are shown in the figures. Detecting corners from such
measurements is not straightforward.

Translating the measurements into a description 

Suppose then that one measures the first and second directional
derivatives of intensity everywhere in an image. What do we do with them?
Translating one large array of numbers into several other large arrays is not an
obviously useful process. It turns out, however, that we can make a great
simplification at this stage in the analysis. Provided that measurements are made
with masks of several sizes, one can show that the positions and sizes of the
peaks in the measurements provide enough information to compute the description
of the underlying intensity changes. Furthermore, provided that a group of peaks
is sufficiently isolated from other peaks, the other peaks may be ignored when
analyzing that group.

The reason for this is illustrated in figure 2, which shows the
difference between edge-mask values obtained using masks of two different sizes
on a step change in intensity (2a), and on a gradual change (2b). The results are
analogous to the power spectra of different kinds of edge. Step changes are
"seen" equally well by all sizes of mask. Gradual changes are seen increasingly
faintly by edge-shaped masks whose dimensions are smaller than the distance over
which the intensity change is taking place. Figure 2c shows this effect in graphic
form, and from it one can see that a good estimate of the "fuzziness" of an edge
may be made by finding the mask size at which the edge-mask response starts to
decline.



FIGURE 2 



2. Diagrams of "edge-shaped" mask convolutions with a step (a) and with a gradual
(b) intensity change. The intensity profiles appear at the top. The convolutions
with the two sizes of mask shown on the left appear beneath the intensity
profiles. For a step change in intensity, masks of all sizes produce the same
maximum response (trace a in graph (c)). Gradual intensity changes are seen
progressively more weakly by the smaller masks (trace b in graph (c)).

This shows one way in which the use of multiple mask sizes is
important, but there is another reason which is perhaps even more important. It is
that where a faint edge exists in the image, it is frequently impossible to tell from
a single record which of the peaks are important, and which are due to noise.
Matching peaks obtained using different sizes of mask greatly aids the separation
of signal from noise.
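
The dependence of the edge-mask response on mask size can be reproduced on two toy profiles. The following is a minimal sketch, assuming a response defined as the difference between the mean intensities of two adjacent panels; the names `peak_edge_response` and `estimate_fuzziness`, and the 0.9 cut-off, are hypothetical choices made for this illustration and not the procedure of Marr (1974b).

```python
def peak_edge_response(profile, width):
    """Largest difference between the means of two adjacent panels of the given width."""
    best = 0.0
    for i in range(len(profile) - 2 * width + 1):
        left = sum(profile[i:i + width]) / width
        right = sum(profile[i + width:i + 2 * width]) / width
        best = max(best, abs(right - left))
    return best


def estimate_fuzziness(profile, widths, fraction=0.9):
    """Smallest mask width whose response reaches `fraction` of the largest mask's response.

    The cut-off `fraction` is an assumption made for this sketch.
    """
    widths = sorted(widths)
    full = peak_edge_response(profile, widths[-1])
    for w in widths:
        if peak_edge_response(profile, w) >= fraction * full:
            return w
    return widths[-1]


# A sharp step, and a gradual change spread over eight samples.
step = [0.0] * 16 + [12.0] * 16
ramp = [0.0] * 16 + [1.5 * k for k in range(1, 9)] + [12.0] * 16

step_responses = [peak_edge_response(step, w) for w in (1, 2, 4, 8)]
ramp_responses = [peak_edge_response(ramp, w) for w in (1, 2, 4, 8)]
```

The step gives the same response at every mask size, whereas the response to the gradual change falls steadily as the mask shrinks below the extent of the change, which is the behaviour that figure 2c describes.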

The process of computing the description may therefore be
reduced to three operations: firstly, find the peaks in the measurements obtained
from the convolutions of the image with different sizes of mask, and select the
relevant peaks using the criterion illustrated in figure 2; secondly, separate the
peaks into isolated groups; and thirdly, parse the local configuration of peaks into a
descriptive element. A small number of classes of peak configuration suffices to
cover the cases that can actually occur, and they are illustrated in figure 3. The
figure shows typical combinations of peak patterns that occur in the outputs from
edge-mask (upper records) and from bar-mask (lower records) convolutions.
Examples of the masks that we use appear in figure 3a. The descriptor EDGE is
used when two peaks of about equal and opposite signs occur together in the
bar-mask record (3b). If one bar-mask peak is considerably smaller than the other, the
edge is classified as an EXTENDED-EDGE (3c). Extended-edges are common where
a convex boundary is illuminated from one side. Figure 3d shows an intensity
gradient edge, and figure 3e corresponds to the presence of a thin LINE such as
can occur in the glare off an object's edge, or a very thin pencil stroke. Finally
there are edges that begin and end gradually, and extend over a relatively large
distance; these are classified as SHADING-EDGEs (figure 3f). In addition to
descriptors of edge type, one can measure an edge's STRENGTH, POSITION,
ORIENTATION, and FUZZINESS. This last parameter is computed by comparing the
amplitudes of the peaks obtained using masks of the same shape but different
sizes. (See figure 2, and Marr (1974b) for the details).
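
The first and third operations, finding peaks and parsing a local peak configuration, can be sketched for the one distinction the text states explicitly: EDGE versus EXTENDED-EDGE in the bar-mask record. This is a hedged illustration only. The helper names and the `ratio` threshold that decides when one peak is "considerably smaller" are assumptions, and the remaining stereotypes of figure 3 are not covered.

```python
def find_peaks(record):
    """Return (index, value) for positive local maxima and negative local minima."""
    peaks = []
    for i in range(1, len(record) - 1):
        v = record[i]
        is_max = v > 0 and v > record[i - 1] and v >= record[i + 1]
        is_min = v < 0 and v < record[i - 1] and v <= record[i + 1]
        if is_max or is_min:
            peaks.append((i, v))
    return peaks


def classify_pair(first, second, ratio=0.5):
    """Classify two neighbouring bar-mask peak values.

    `ratio` is a hypothetical threshold: peaks of opposite sign whose sizes
    differ by less than this factor are called an EDGE, otherwise an
    EXTENDED-EDGE. Peaks of the same sign are left unclassified here.
    """
    if first * second >= 0:
        return "UNCLASSIFIED"
    small, large = sorted((abs(first), abs(second)))
    return "EDGE" if small >= ratio * large else "EXTENDED-EDGE"


# Toy bar-mask records, invented for illustration.
balanced = [0, 10, 20, 0, -20, -10, 0]
lopsided = [0, 20, 0, -5, 0]

balanced_label = classify_pair(*[v for _, v in find_peaks(balanced)])
lopsided_label = classify_pair(*[v for _, v in find_peaks(lopsided)])
```

A fuller parser would also consult the edge-mask record and the peak sizes obtained at several mask widths, as the text describes.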

Figure 4 gives an example of an intensity distribution that has
been described by this process, and the legend explains which mask convolutions
were used. One of the assertions has been traced back to the convolution
profiles, and the arrows point to the peaks that gave rise to that particular
assertion. The low-level vocabulary that is used here is not intended to be
definitive, but some claim is made to the effect that it is a good example of the
genre, because it has sufficient expressive power to describe most kinds of
shading adequately, and the method is simple and works reasonably well.
Experiments are being planned to determine whether the types of intensity change
that are distinguished by these primitives are also perceptually distinct.

3. Examples of edge- and bar-masks appear in 3a. 3b - f give the classification
that is described in the text of peak patterns in edge- and bar-mask convolution
profiles. The primary visual processor uses these stereotypes to classify intensity
changes in an image.



FIGURE 4



4. The intensity distribution exhibited in 4a, whose profile appears in 4b, was
obtained by illuminating a curved piece of white paper from one end, and viewing
it from above. Its description, computed using an edge-mask of panel-width S (4c),
and bar-masks of panel-widths 4 (4d) and d (4e), is as follows:

EDGE (POSITION 180) (AMOUNT 136) (FUZZ SHARP)
EDGE (POSITION 312) (AMOUNT 3) (FUZZ 4)
EDGE (POSITION 332) (AMOUNT 2) (FUZZ SHARP)
EDGE (POSITION 535) (AMOUNT -3) (FUZZ 4)
EDGE (POSITION 544) (AMOUNT 2S) (FUZZ 5)
EDGE (POSITION 5£4) (AMOUNT 2) (FUZZ 4)
EDGE (POSITION 590) (AMOUNT 1) (FUZZ 4)
EXTENDED-EDGE (POSITION 682) (AMOUNT -12) (FUZZ 9)
(the peaks giving rise to this edge are marked with arrows)
EDGE (POSITION 7 2^) (AMOUNT -20) (FUZZ 6)
EDGE (POSITION 776) (AMOUNT 3) (FUZZ 4)
EDGE (POSITION 7S4) (AMOUNT -4) (FUZZ 4)
SHADING-EDGE (POSITION £70) (AMOUNT -14) (WIDTH 67)
SHADING-EDGE (POSITION 491) (AMOUNT 4) (WIDTH 36)
SHADING-EDGE (POSITION 43 9) (AMOUNT -B) (WIDTH 73)

FIGURE 5



5. After description of intensity changes has occurred independently at each of 8
orientations, and after linear assembly of these descriptions has taken place
locally, the eight descriptions are combined. An example of the result obtained
from 5a appears in 5b. Short noise elimination then takes place, giving 5c. The
asterisks denote places at which directional measures of contrast suddenly change.
They are the precursors of termination assertions.

Combining orientation-dependent descriptions

We have seen how to compute an orientation-dependent
description of the intensity changes, and we now deal with the problems of
combining local pieces of description from the same orientation, and of combining
the descriptions obtained at different orientations. What then are the issues that
are raised in combining the local analyses described in the previous section?

The information that is used during this operation is primarily of
two kinds: local consistency relations, which enable one to string local assertions
together; and local competition, between competing descriptions of the same
phenomenon obtained from masks at different orientations. Surprisingly, it turns
out that the local consistency relations are more important than local competition,
and that local competition is required not so much between descriptions obtained
from masks at nearly adjacent orientations, but between the descriptions obtained
from masks that are nearly perpendicular.

Figure 5 illustrates the problems that arise. The image was first
operated on at eight orientations with the process described in the last section.
Next, these local assertions have been glued along directions nearly parallel to the
masks from which they were obtained. An interesting feature of the process is the
abundance of short segments perpendicular to the primary edge (figure 5b). These
arise because of a combination of local noise, the image tessellation, and other
irregularities in the image. They occur in every image we have processed. In
dealing with them, one cannot dismiss in a cavalier manner all very short segments:
tiny "blobs" in the image also give rise to them, as can be seen from the same
image at coordinate (73, 75). But a "small" element like this can be ignored if (a)
it crosses a "long" element, and (b) its contrast is less than that of the item it
crosses. Figure 5c shows the results of removing small noise elements using this
criterion.
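
The two-part criterion for discarding small elements lends itself to a short sketch. The version below assumes that elements are straight segments with a scalar contrast, and that "small" and "long" are decided by two length thresholds; the `Segment` class, the threshold values, and the function names are hypothetical and are introduced here only to make the rule concrete.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float
    contrast: float

    @property
    def length(self):
        return hypot(self.x2 - self.x1, self.y2 - self.y1)


def _side(ax, ay, bx, by, px, py):
    """Sign tells on which side of the line a->b the point p lies."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def crosses(s, t):
    """True when the two segments properly intersect."""
    d1 = _side(t.x1, t.y1, t.x2, t.y2, s.x1, s.y1)
    d2 = _side(t.x1, t.y1, t.x2, t.y2, s.x2, s.y2)
    d3 = _side(s.x1, s.y1, s.x2, s.y2, t.x1, t.y1)
    d4 = _side(s.x1, s.y1, s.x2, s.y2, t.x2, t.y2)
    return d1 * d2 < 0 and d3 * d4 < 0


def remove_short_noise(segments, short_max=3.0, long_min=10.0):
    """Drop a short segment only if (a) it crosses a long one and (b) has lower contrast.

    `short_max` and `long_min` are assumed thresholds, not values from the text.
    """
    kept = []
    for s in segments:
        is_noise = s.length <= short_max and any(
            t.length >= long_min
            and crosses(s, t)
            and abs(s.contrast) < abs(t.contrast)
            for t in segments
        )
        if not is_noise:
            kept.append(s)
    return kept


primary = Segment(0, 0, 20, 0, contrast=30)
spur = Segment(5, -1, 5, 1, contrast=10)         # short, crosses, weaker: removed
blob_part = Segment(73, 74, 73, 76, contrast=10)  # short but isolated: kept
strong = Segment(10, -1, 10, 1, contrast=50)      # short, crosses, stronger: kept
cleaned = remove_short_noise([primary, spur, blob_part, strong])
```

Keeping the isolated short segment is the point of the rule: short segments that cross nothing may belong to a tiny blob and must survive to the next stage.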

The asterisks in the figures signify that the contrast of the edge
changes rapidly at that point, possibly becoming zero. They are the precursors of
assertions about the presence of terminations, but space forbids a discussion of
them here (see Marr 197flc).

The only other item of note in computing the primal sketch is the
question of detecting local, small blobs. Figure 5c at coordinate (73, 75) shows
how they appear, and in fact we make small blobs a primitive element of the
primal sketch, together with their associated "intensity" value, and the sizes and
orientations of their major and minor axes. Finding these blobs from the glued
assertions depends a small amount on elegant programming, and a large amount on
brute force. The reader may ask why do we detect blobs in this way; why not
use a simple blob-detector like a mask with a centre-surround organization? The
reasons are twofold. Firstly, when using a centre-surround mask to generate
assertions, one has to be very careful of the boundary condition problem
mentioned earlier. One can devise parallel schemes of the form "a blob exists at
point P if the centre-surround mask gives an isolated peak there, and if there are
no edges in the vicinity," but these are relatively expensive to compute, and
become unreliable if the blob is not very circular, or if there are indeed other,
fainter or unrelated edges in the vicinity. It is interesting in this connection to
note that the phosphenes produced by stimulating a point in area 17, an act
which presumably stimulates orientation-sensitive cells at all orientations,
commonly take the form of a bright point in the visual field (Brindley 1970, p 124).

The primal sketch differs from a simple feature-point array in a
rather subtle way, and as a model of the information-processing that is performed
in area 17 it makes some definite and perhaps unexpected statements. Some
examples will help to make this clear. One consequence is that the direct output
of a linear simple cell is not available as an element in the primal sketch. Its
measurement is used to create an assertion about the presence of an edge, and
that assertion is what is available. Creating the assertion is an act of computation:
a simple one, since it involves little more than peak matching and the
classification of a peak configuration, but an act of computation nonetheless. The
main point is that this has to go on.

An interesting consequence of this is illustrated in figure 6.
Suppose that an image contains two small close blobs. These blobs give rise to
measurements by a number of sizes of mask, some small ones represented by
the tiny line segments, and some large ones, like the one that is illustrated. One's
natural inclination would be to believe that the large "line-detector" would fire, and
that this would have something to do with seeing the two blobs. This view
amounts to supposing that simple cells write directly into a feature-point array.
But if our theory is correct, although the large "simple cell" may indeed fire, its
measurement will not be used to compute the description of the two blobs
because their sharp boundaries cause the associated intensity change to be
described from peaks in the small masks. The effect illustrated in figure 2c will
cause the description to be computed from the smaller masks unless the blobs are
severely defocussed. [Compare also our failure to perceive L. D. Harmon's coarsely
sampled and quantized image of Abraham Lincoln (Julesz 1971, p. 311)]. I mention
this point because Julesz (1975, pp 40-4£) has concluded that in situations like
this one, the output of large simple cells in this configuration plays no part in
texture vision discriminations. We shall see the relevance of this shortly.

The structure of the primal sketch may be summarized as follows:

PS1. The primary visual processor delivers a symbolic description of the intensity
changes present in an image. This description uses the following primitives to
describe intensity changes:

(i) Various types of EDGE
(ii) LINEs, or thin BARs
(iii) BLOBs

The items (i) and (ii) have been assembled into straight segments, and short noise
elimination has occurred.

PS2. The following items are bound to each element of the description.

(i) ORIENTATION
(ii) SIZE - length and width if both are defined, diameter if
major and minor axes are equal or undefined.
(iii) INTENSITY (LIGHTNESS).
(iv) POSITION.
(v) TERMINATION POINTS.
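
The summary PS1 and PS2 can be read as the specification of a record type. The sketch below is one possible rendering, offered as an assumption rather than as the representation used in the programs: the class name `SketchElement`, its field names, the helper `of_kind`, and the blob's size are all invented for illustration. The two positions are the ones quoted in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class SketchElement:
    """One assertion of the primal sketch, with the items of PS2 bound to it."""
    kind: str                                   # e.g. "EDGE", "LINE", "BLOB"
    orientation: float
    position: Tuple[Point, ...]
    intensity: Optional[float] = None
    size: Optional[Tuple[float, float]] = None  # (length, width), if defined
    terminations: List[Point] = field(default_factory=list)


def of_kind(elements, kind):
    """Select elements by primitive type."""
    return [e for e in elements if e.kind == kind]


sketch = [
    # The example assertion given earlier in the text.
    SketchElement("SHADING-EDGE", 0, ((34, 48), (73, 48))),
    # A small blob at the coordinate mentioned for figure 5c; its size is made up.
    SketchElement("BLOB", 0, ((73, 75),), size=(2, 2)),
]
blobs = of_kind(sketch, "BLOB")
```

Selecting on a single bound parameter in this way is the simplest instance of choosing items out of the primal sketch by the value of one of their parameters.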



What drawings tell us 

In order to make the second step of my argument, I must digress
awhile on the manifest variety of ways in which we can interpret simple pencil
drawings that lack semantic content. The point I wish to make is that from our
ability to interpret certain kinds of drawings, we can infer with some confidence
that certain kinds of symbolic process must exist in our visual systems. Let us
take an extreme example first. In figure 7a there is little doubt that some process
somewhere is creating a circular contour, and that the "places" in the image that
are giving rise to that contour are the inner ends of the radial lines. One cannot
argue that Fourier detection methods will produce it for one, because it really is
not there. This contour is not being detected, it is being constructed. Figure 7b
shows another example in which "ends of things" are being formed into a
perceptually vivid contour.

From these two rather strong examples, we see that abstractly
defined places in an image can be assembled into contours that have a definite
perceptual existence, despite the absence of apparent semantic content in the
image. If one approaches these phenomena from a computational point of view, it
is natural to think of this process as occurring in two steps. Firstly, certain things
in drawings can cause "places" to be defined in some abstract sense. Secondly,
"places", once defined, can be aggregated in various ways.

Having realized this, one immediately wants to know in what ways
places actually can be defined, and in how many different ways they can be
aggregated. A better feel for the problem can be gained by looking at the rest of
figure 7, and at figure 8. We are forced to conclude that "places" may carry
intrinsic orientation information, and that this orientation information may or may
not be used (figures 8a and 7c). Indeed these two situations can occur in the
same figure (7a).



FIGURE 6

6. The difference between the primal sketch and a feature-point is brought out by
the image 6a. A measurement taken with a large mask (6b) could generate a
feature-point, but it would not be used in the computation of the primal sketch.
This is because the sharp contrast changes force the use of measurements from
small masks (6c).

7. These drawings provide evidence for the action of several symbolic processes
during our perception of them. In particular, the circular "contour" in 7a, and the
linear one in 7b, are being constructed, not detected.



We see from these examples that the aggregation of places can
occur in two broad ways: clustering into groups that often have computable
boundaries, and the assembling of places into curves or lines, which I call
curvilinear aggregation. In the case where there is an orientation associated with
the place, aggregation can either use or ignore it. If the orientation is used, there
are two possible ways: the aggregation can either follow the intrinsic orientation,
or it can proceed in a fixed orientation relative to it (figure 9c). If the number of
places involved is very small (less than 5 say), the places may form a standard,
named configuration (see figure 9) which is evidently described relative to an axis
which is imposed on the figure, and whose default value is the vertical.

Interestingly, procedures for implementing each aggregation
technique are quite straightforward. They have a common flavor: a mixture of a
simple local process operating everywhere over the image, together with a
sensitivity to, and the ability to generate, one or two straightforward global
measures. To give you an idea of their simplicity, I shall outline one of them, which
we call theta-aggregation. Theta-aggregation is the process by which oriented
items are aggregated in a direction that differs from their intrinsic orientation. The
difficult part about it arises because measures of the "overlap" of two oriented
items depend upon the angle, theta, that the final aggregate makes with each local
unit (see figure 10). So theta determines the aggregation process, but also
depends upon it. For good data, it may be quite unnecessary to know theta: place
aggregation that ignores theta will suffice to compute the aggregate. In general,
however, one will need to take theta into account, as we shall shortly see.
Viewed from a very abstract level, this computation may be regarded as a process
of solving a large number of rather simple equations. In practice, a network with
feed-back will solve it, where the information being fed back is theta. We have
implemented an iterative version of this process, and some results are displayed
later on.
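
The circular dependence just described (theta controls which items are
grouped, and the grouping in turn fixes theta) can be illustrated by a small
fixed-point iteration. The sketch below is not the procedure used in our
programs: it replaces the overlap measure by a much cruder rule (link two item
centers if they are close and the line joining them lies within a tolerance of
the current theta), and the function names, the radius and the tolerance are
all assumptions made for the example.

```python
import math
from itertools import combinations


def axial_diff(a, b):
    """Smallest difference between two undirected angles, in degrees (0 to 90)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def axial_mean(angles):
    """Mean of undirected angles in degrees, computed on the doubled angle."""
    s = sum(math.sin(math.radians(2.0 * a)) for a in angles)
    c = sum(math.cos(math.radians(2.0 * a)) for a in angles)
    return (math.degrees(math.atan2(s, c)) / 2.0) % 180.0


def theta_aggregate(centers, theta0, radius, tol, max_iter=10):
    """Alternate between linking items along theta and re-estimating theta.

    Returns the final theta and the links found with the last theta tested.
    """
    theta = theta0 % 180.0
    links = []
    for _ in range(max_iter):
        links, directions = [], []
        for i, j in combinations(range(len(centers)), 2):
            dx = centers[j][0] - centers[i][0]
            dy = centers[j][1] - centers[i][1]
            if math.hypot(dx, dy) > radius:
                continue                      # local process: near neighbors only
            direction = math.degrees(math.atan2(dy, dx)) % 180.0
            if axial_diff(direction, theta) <= tol:
                links.append((i, j))
                directions.append(direction)
        if not directions:
            break                             # nothing to group at this theta
        new_theta = axial_mean(directions)    # the quantity that is fed back
        converged = axial_diff(new_theta, theta) < 1e-6
        theta = new_theta
        if converged:
            break
    return theta, links


# Two parallel stripes of item centers running at 80 degrees (made-up data).
c, s = math.cos(math.radians(80.0)), math.sin(math.radians(80.0))
stripe = [(5.0 * i * c, 5.0 * i * s) for i in range(6)]
centers = stripe + [(x + 20.0, y) for x, y in stripe]
theta, links = theta_aggregate(centers, theta0=90.0, radius=6.0, tol=30.0)
```

Starting from a guess of 90 degrees, the first pass links neighbors along each
stripe, the feedback step moves theta to the direction of those links, and the
second pass confirms it. Averaging on the doubled angle is used because the
directions are undirected (0 and 180 degrees are the same direction).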

In summary then, the argument of this section has been that our
ability to interpret certain simple drawings shows that we can bring certain highly
symbolic processes to bear on the analysis of drawings whose semantic content is
small. I summarize the processes that appear to be available below, even though
space has not permitted mention of several of them.

PLACES may be defined by:

(P1) The position of a blob, or of an edge or line that is not too long.

(P2) The end of an edge or line that is not too short, or of a blob with long major
axis and short minor axis.

(P3) A small aggregation of places.

The definition is slightly recursive. This is to be expected, since the assertions
produced by one aggregation process are presumably written into the same active
geometrically organized storage processor as is the primal sketch. The precise
boundary between "too long" and "too short" can be left to individual taste,
because near it, both definitions will usually lead to the same aggregations. The
boundary needs to be in the region of 0.5 to 1 degrees of arc at foveal resolution.
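
Rules P1 and P2 can be written down as a small function. This is a minimal
sketch under stated assumptions: the function name, the argument layout and
the single boundary value of 0.75 degrees (picked from inside the range just
quoted) are illustrative choices, and P3 is left out.

```python
def places_for(center, ends, length, boundary=0.75):
    """Return the places contributed by one edge, line or elongated blob.

    center   -- (x, y) position of the item
    ends     -- list holding its two end points
    length   -- its length in degrees of visual angle
    boundary -- assumed dividing line between "too short" and "too long"
    """
    if length < boundary:
        return [center]        # P1: the position of an item that is not too long
    return list(ends)          # P2: the ends of an item that is not too short


short_item = places_for((0.0, 0.0), [(-0.1, 0.0), (0.1, 0.0)], length=0.2)
long_item = places_for((0.0, 0.0), [(-1.5, 0.0), (1.5, 0.0)], length=3.0)
```

Here the short item yields one place (its position) and the long item yields
two (its ends), which is the behaviour that P1 and P2 describe on either side
of the boundary.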

AGGREGATION may proceed in the following ways:

(1) Clustering nearby places, using methods about as complex as B1 or B2 of
Jardine & Sibson (1971), but which are sensitive to global parameters of size and
average density. Clustering facilities that appear to have about this complexity can
operate on patterns of dots in most human visual systems (see e.g. Julesz (1971,
pp 105ff), or recently O'Callaghan (1974b)).

(2) Curvilinear aggregation: aggregation that has a (local) orientation, and which
produces contours by joining nearby, aligned places. It is probable that only first
and second nearest neighbors need be considered by the local components of
these processes, but some global information is also generated and used [see
O'Callaghan (1974a and b) for access to recent literature on dot-grouping studies,
and Marr (1975)].

(3) Theta-aggregation: the grouping of local, similarly oriented items in a direction
that differs from the intrinsic orientation, but in a manner which uses it.

(4) If the number of places is small (< 5), the configuration formed by the places
may be described relative to some specified axis by means of a special
configuration datastructure (see Marr 1976).
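
To make item (1) concrete, here is a minimal sketch of clustering that is
sensitive to average density. It is a stand-in, not the B1 or B2 method
cited above: places are linked when they lie within a multiple of the mean
nearest-neighbor distance, and linked places are merged into clusters. The
function names and the factor of 2 are assumptions.

```python
import math


def nearest_neighbor_distance(points, i):
    """Distance from point i to its closest other point."""
    return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)


def cluster_places(points, factor=2.0):
    """Group places joined by gaps of at most factor * mean nearest-neighbor distance.

    Needs at least two points. Returns clusters as sorted lists of indices.
    """
    n = len(points)
    mean_nn = sum(nearest_neighbor_distance(points, i) for i in range(n)) / n
    limit = factor * mean_nn           # the global, density-dependent parameter

    parent = list(range(n))            # union-find forest over place indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= limit:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two groups of three dots, separated by a gap that is large for this density.
dots = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
groups = cluster_places(dots)
```

Because the linking distance is derived from the dots themselves, the same
code groups a sparse pattern and a dense one without any change of parameter,
which is the sense in which the process is sensitive to average density.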



Global Measures on the Primal Sketch 

Before the digression of the last section, we had reached the
point of defining the Primal Sketch, and of showing how to compute most of the
quantities in it. We also saw the primal sketch of a very straightforward image, of
a cylinder. The primal sketch is rarely as simple as that, however. Figures 12 and
13 contain examples of the primal sketches of more complex images, and, as one
might expect, they are in general large and unwieldy collections of data.
Furthermore, it is difficult to see how the complexity of the primal sketch could be
an artifact of our particular choice of primitives: images really are complex in this
way.

The unwieldy nature of the primal sketch is therefore something
with which we have to live, and turn to our advantage if possible. The
fundamental problem of the next stage of the analysis is simply stated: how do
we select out from the primal sketch those regions that should be treated as unit
forms by subsequent descriptive processes; and is it possible to do this without
complex interactions between the primal sketch and higher-level knowledge? In
perceptual terms, the computational problem that we must now address
corresponds to distinguishing between figure and ground, and it is strongly related
to the problem of texture vision (Julesz 1971).

8. These drawings exhibit aggregation processes that take some account of the
orientation present at the aggregated places.

From an abstract point of view, the primal sketch is simply a
large body of data. There is therefore no difficulty in extracting from it certain
simple global measures and statistics. In particular, we shall assume that the
following measures are automatically available from any primal sketch:

MEASURES taken over moderately sized regions (0.5 to 1.0 degrees at foveal
resolution) of the primal sketch:

M0. The total amount of contour, and number of blobs, at different contrasts and
intensities.

M1. ORIENTATION: the total number of elements at each orientation, and the total
contour length at each orientation - the orientations being divided into about 12
discriminable buckets. Detection of the existence of one, two, or three
predominant orientations, and the recognition of distributions that have substantial
amounts of contour in more than three orientations.

M2. SIZE: measurement of the mean and variance of the size parameters defined
in the primal sketch.

M3. INTENSITY: measurement of the mean and variance of the lightness of items
in the primal sketch.

M4. SPATIAL DENSITY: mean and variance of the nearest neighbor distances, and
possibly the mean second-nearest neighbor distance. There is no computational
problem in obtaining these measures.
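
As an indication of how little computation these measures need, the sketch
below computes M1 and M2 for a handful of items. It is illustrative only: the
function names, the representation of an item as an (orientation, length)
pair, the centering of the 12 buckets on multiples of 15 degrees, and the
sample values are all assumptions.

```python
import statistics


def orientation_measures(items, n_buckets=12):
    """M1: number of items and total contour length per orientation bucket.

    items is a list of (orientation_in_degrees, length) pairs. With 12 buckets
    each is 15 degrees wide and centered on 0, 15, 30, ... 165 degrees.
    """
    width = 180.0 / n_buckets
    counts = [0] * n_buckets
    lengths = [0.0] * n_buckets
    for orientation, length in items:
        bucket = int((orientation % 180.0) / width + 0.5) % n_buckets
        counts[bucket] += 1
        lengths[bucket] += length
    return counts, lengths


def size_measures(items):
    """M2: mean and (population) variance of item length."""
    sizes = [length for _, length in items]
    return statistics.mean(sizes), statistics.pvariance(sizes)


items = [(58.0, 10.0), (62.0, 12.0), (0.0, 5.0), (178.0, 3.0)]
counts, lengths = orientation_measures(items)
mean_size, var_size = size_measures(items)

# Bucket holding the most contour, expressed as its central orientation.
predominant = max(range(12), key=lambda b: lengths[b]) * 15
```

With these made-up items the bucket centered on 60 degrees holds the most
contour, so 60 degrees would be reported as the predominant orientation. Note
that an item at 178 degrees falls in the bucket centered on 0 degrees, since
orientation wraps around at 180.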



Texture Vision 

There are three parts to the problem of texture vision. How
does one discriminate between textures, and hence form regions from texture
differences? How does one describe the shapes and dispositions of the regions so
obtained? And finally, how does one interpret a texture, in the sense of
understanding the structure of the surface that gave rise to it? Only the first of
these will be dealt with here.

There are several current ideas on texture processing. Some
authors have used Fourier techniques, and in certain circumstances, the spatial
power spectrum can successfully separate different regions (Bajcsy 1972). Others
have constructed specialized operators which when applied to an image sometimes
discriminate between regions with different texture. Probably the earliest
example of this was the Roberts gradient (Roberts 1963). The most interesting
and comprehensive proposal is due to Julesz, Frisch, Gilbert and Shepp (1973),
[see also Julesz (1975)], who showed that visual textures that differ only in their
third or higher order statistical structure are rarely perceptually discriminable;
whereas visual textures that differ in their first or second order statistics can
almost always be distinguished. The important point about this finding lies in its
demonstration of the essential simplicity of texture processing. Although it gives
no insight into the exact nature of the processing, it does imply that all coefficients
of third and higher-order terms in its Volterra series expansion are zero.

FIGURE 9

9. Examples of "standard configurations" that we have found it useful to recognise.
The reader will probably perceive them relative to a vertical axis. The VEE
shown in 9f is used in figure 15e.

FIGURE 10

10. The measure of the overlap of two adjacent, parallel lines depends on an
external angle, theta. In 10a, theta is 50 degrees, which is the value at which
iteration begins.

We have now reached the core of this article. We saw in the
last section that certain computational facilities exist and are deployed during our
reading of certain kinds of drawings. The facilities were summarized at the end of
that section as processes P1-3 and A1-4. It is, of course, possible that their
existence is no more than a happy accident, which fortuitously allows us to
interpret the idle scribblings of the artistically gifted. The central thesis of this
article is that these processes are available precisely because they are needed to
help interpret the primal sketch; and furthermore that these symbolic processes,
together with first-order discriminations based on the measures M0-4 defined
above, are sufficient to account for the range of texture discriminations of which
we are capable, within the class of images to which this article is restricted. In
other words, texture vision is actually implemented not by second-order operations
on the image, but by first order discriminations, together with a small number of
grouping operations, acting on the primal sketch of the image. Julesz (1975 p43)
mentioned in an aside the possibility that texture vision may rest on "first-order
statistics of various simple feature extractors", but this idea requires the concepts
of the primal sketch and of the aggregation primitives before it can be brought to
fruition.

So that the reader may form an intuitive grasp of the central
thesis, let us re-examine two of the textures devised by Julesz, and follow this
with some examples of the texture analysis run on the images whose primal
sketches we saw earlier. Firstly, consider figure 11. Julesz notes that in 11a, the
two regions have distinct second-order statistics, but not in figure 11b. Hence,
according to his rule, the two regions are distinguishable in 11a, but not in 11b.
Now consider our new explanation of this. Orientation measures are the only
distinguishing feature of the primal sketch representation, because everything else
has carefully been held constant. In 11b, the two basic elements are related by a
180 degree rotation, and so the orientation statistics to which they give rise are
identical. Hence the two regions are indistinguishable. In 11a however, there is
more contour at 0 degrees than at 90 degrees in the central patch, but the
opposite is true in the surround. Hence the two regions are immediately
distinguished.
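
The argument about figure 11 rests on a simple property of undirected
orientation: rotating an element by 180 degrees leaves the orientation of every
contour in it unchanged, whereas other rotations generally do not. A minimal
sketch follows; the element used is a made-up stand-in, not one of Julesz's
patterns, and the function names are assumptions.

```python
from collections import Counter


def rotate(element, degrees):
    """Rotate an element given as a list of (orientation, length) strokes."""
    return [((orientation + degrees) % 180, length) for orientation, length in element]


def contour_by_orientation(elements):
    """Total contour length at each orientation over a region of elements."""
    totals = Counter()
    for element in elements:
        for orientation, length in element:
            totals[orientation] += length
    return dict(totals)


# A long horizontal stroke plus a short vertical one (hypothetical element).
base = [(0, 3), (90, 1)]

centre = contour_by_orientation([base] * 4)
surround_180 = contour_by_orientation([rotate(base, 180)] * 4)
surround_90 = contour_by_orientation([rotate(base, 90)] * 4)
```

The region built from elements rotated by 180 degrees has exactly the same
orientation measures as the original, so a first-order discrimination on those
measures cannot separate the two. Rotating by 90 degrees swaps the amounts of
horizontal and vertical contour, and the regions then differ at once.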

The second example appears as figure 11c. Some of the modules
in the pattern have been reflected about a vertical line through their centers.
Their second-order statistics are therefore different. This is an example in
which Julesz's generalization fails.

The statistics of the orientations of the contours are however unchanged in
this particular instance, because only vertical and horizontal orientations are
involved. Hence the present theory predicts that the two regions are in fact
indistinguishable.

COARSE IMAGE DESCRIPTORS (used in primary control of texture analysis)
Orientation buckets are 15° wide. [Table: NUMBER OF ITEMS and TOTAL CONTOUR
LENGTH in each orientation bucket.]

12. 12a gives a rendering of the primal sketch of the image of figure 1a. 12b
shows some measures made on it. Theta-aggregation has decoded the texture that
is present, and the aggregates are displayed as the mosaic 12c.

13. 13b shows a rendering of the primal sketch of 13a. 13c gives the associated
orientation-dependent statistics. The predominance of items at 60 degrees causes
theta-aggregation to be attempted at this orientation. The default setting of theta
produces the aggregate 13d. From this, theta is found, and the aggregation process
then extracts the stripes successfully (13e - i).

Now let us look at some real images. Figure 12a shows the
primal sketch of the chair whose image appeared as figure 1a, and figure 12b
gives some of its orientation statistics. The first thing to realize about this image
is that it is textured at all. The texture is so simple that one easily overlooks it.
Yet the texture exists in exactly the sense of this article, and the process that
succeeds in decoding it is theta-aggregation. Figure 12c shows the results of
running the theta-aggregation procedures on this image, and each element in the
mosaic contains just one aggregate.

We see from this example a glimmer of the power of texture
vision. Using one knowledge-free technique, we have separated the chair from its
background, and also separated the problem of divining the overall three-
dimensional shape of the chair from the analysis of its surface properties. Each of
the aggregates can be described simply by position, orientation, and extent; and
this produces a skeleton of the outline of the chair. By considering separately the
structure of just one aggregate, one could go on to compute a description of the
surface structure of the material out of which the chair is made.

The next example shows a more difficult case of theta-aggregation.
The image is taken from Brodatz (1972, plate LJI IS), and the intensity
values are shown in 13a. Figure 13b shows an approximation to the primal sketch.
Contours of all intensities, lengths, and orientations are shown, and as one would
expect from an image of this complexity, 13b has a somewhat messy appearance.
Figure 13c gives statistical information about this image, from which it is evident
that items at an orientation of around 60 degrees are strongly predominant. The
average length of items at this orientation is 13. These coarse measures cause
the texture analyzer to attempt to group the edges at this orientation. Initially,
the direction in which grouping should take place is unknown, so a default of 150
degs (= 60 + 90) is assumed, and stringent grouping parameters are used. This
leads to the primary cluster shown in figure 13d. From this, the correct direction
is obtained (= 83 degs), and the cluster process then groups the items into the
stripes shown in 13e, f, g, h, and i. This completes primary texture processing.
Once the primary stripes have been obtained, another stage of theta-aggregation
serves to relate the stripes to one another. Notice that in this image, some of the
stripe information has been picked up directly from intensity values (see figure
13b). This would not be true of a more herring-bone texture, and the analysis
does not depend upon it. Our present system is successful at processing herring-
bone textures of similar complexity in which the two types of stripe have the
same average reflectance.

FIGURE 14

14. Curvilinear aggregation operating on the primal sketch shown in figure 5c
produced the elements 14a, b & c. Once larger units have been obtained, the
governing parameters can be relaxed, and the elliptical form (14d) is obtained. At
this point, the system is unaware of its shape.

15. This image of a toy bear (15a) has the primal sketch illustrated in 15b. The
three principal forms extracted from 15b appear in 15c, d & e. The items in 15e
are classed as BLOBs, and the configuration that they form is recognised as a VEE
(figure 9f) with modifier FLAT. The axis relative to which this description was
computed is the vertical (default value).

Next, we give an example of a simple kind of curvilinear
aggregation. The local elements of the primal sketch of the cylinder shown in
figure 5 are grouped using tight, conservative techniques into the units shown in
figure 14a, 14b, and 14c. These are then gathered using slightly weaker
constraints into the form shown in 14d. Notice that the contrast across the top-
left portion of the form has the opposite sign from the contrast elsewhere.
Curvilinear aggregation depends on local information about how well two adjacent
segments match, and on global information that includes for example whether the
complete form is closed. The global measures can affect the local choice of
segment in those infrequent cases where no candidate is to be preferred on
purely local grounds (see Marr 1976).

Finally, an example of several types of analysis appears in the
image of a toy bear (figure 15a). The primal sketch appears in 15b. The contours
of his face and muzzle appear in 15c and 15d, and the three blobs that come from
buttons that stand for his eyes and nose appear in 15e. The three blobs define
three places, which in turn provoke a specific configuration description relative to
the default axis, which is the vertical.

The examples given here do not prove the central thesis of this
article. This will need to be tested by experimenting with considerably more
images than the twenty or so with which we have dealt hitherto. But they give us
grounds for believing it to be a reasonable theory of the computational mechanisms
that underlie texture vision and the separation of figure from ground. A more
complete report is in preparation (Marr 1976).

The influence of higher-level knowledge and of purpose on visual
information processing

Perhaps the most novel aspect of these ideas is the notion that
the primal sketch exists as a distinct and circumscribed symbolic entity, computed
autonomously from the image, and operated on by a number of local geometrical
processes, semi-local measures, and first-order discriminations. In a computational
sense, the primal sketch is a very active structure. The information written into it
depends on the image, but lurking active in its fabric lie several highly abstract
geometrical and statistical processes. It is the direct analog for the class of images
studied here of the Cyclopean retina that Julesz (1971) wrote of for binocular
vision. More subjectively, it corresponds very closely to the "image" that one is
conscious of. This reflects the computational hypothesis that all subsequent
analyses read the primal sketch, not the data from which it was computed. The
primal sketch therefore acts in a genuine sense as the interface at which visual
analysis becomes a purely symbolic affair.



If it turns out to be true that texture vision is successfully
implemented by approximately the set of processes that has been defined in this
article, it will mean that visual "forms" can usually be extracted from the image by
using knowledge-free techniques. In other words, the extraction of a visual form
can usually precede its description. From this it follows that it is usually easy to
compute a coarse description of a form.

It is difficult to overstate the importance of this for determining
the structure of subsequent recognition processes. It means that one can see the
shape of the forest without first computing detailed descriptions of all the trees;
that one can compute the cluster of blobs that forms a distant village
independently of deciding that some of those blobs are actually buildings. In the
more mundane example of figure 15, one can compute that the overall shape of
the top form is roughly ovoidal without first having to segment out and describe
separately the bumps that are the bear's ears. Furthermore, it suggests that the
role of higher level knowledge in this process is not only very restricted, but is
also different in kind from its intervention in programs like Shirai's (1973). It does
not affect the line-finding stage (the computation of the primal sketch) at all. Its
most usual modus operandi is in choosing which processes are to be used to read
the primal sketch - for example by specifying which texture predicate should be
used on the image to select the parts of current interest. It can also apply certain
limited kinds of flags to critical segments during their aggregation into forms. The
coupling between higher-level knowledge and the form-extraction processes is
however much weaker than the coupling between the different form-extraction
processes.

It is clearly desirable to have some control over which of the
possible forms in a figure should be delivered at a given moment from the primal
sketch. For example, in the image BEAR there are three possible major forms: the
outline of the head, the muzzle, and the three blobs that represent his eyes and
nose. It seems probable that only one of these should be made available at a time,
and this in turn raises interesting questions about the order in which it is done, the
way in which the three forms and their relative positions are described, and the
way in which those descriptions trigger a larger datastructure and are absorbed by
it. In living systems, which are powerful enough to operate in real time, the
control of the direction of gaze may be rather closely related to the order in which
these events take place.